Digital In-Context Experiments (DICE)

[We need a new subtitle that does not exclusively focus on validity.]

Authors
Affiliations

Hauke Roggenkamp

Institute of Behavioral Science and Technology, University of St. Gallen

Johannes Boegershausen

Rotterdam School of Management, Erasmus University

Christian Hildebrand

Institute of Behavioral Science and Technology, University of St. Gallen

Published

Friday Aug 9, 2024, 10:19 GMT+2

Case Studies

The following case studies demonstrate the practical application and novel capabilities of DICE. They showcase the tool in action and highlight its key contributions, particularly the ability to manipulate entire feed contexts and to measure dwell time. By presenting these studies, we aim to provide a blueprint for researchers interested in adopting DICE for their own work. The first case study illustrates the tool's capacity for manipulating and controlling entire feed contexts, whereas the second focuses on measuring participant engagement through dwell times. Together, these studies exemplify how our tool can enhance ecological validity while maintaining high levels of internal validity, as discussed above.

Context Matters: Evaluating Brand Safety in Social Media Advertising

Brand safety refers to strategies and measures ensuring that a brand’s content, particularly advertisements, does not appear in contexts that could harm the brand’s reputation (see, e.g., Bellman et al. 2018; Lee, Kim, and Lim 2021; Hemmings 2021). These measures are especially crucial in social media, where platforms use automated systems to place ads in dynamic, rapidly changing, and user-generated content environments. Such automated systems often lack the nuanced understanding that humans possess, which can lead to ad placements in contexts that seem appropriate at first glance but are ultimately unsuitable. In our hyper-connected world, such misplacements can rapidly propagate, potentially magnifying reputational damage beyond the initial exposure (Swaminathan et al. 2020). Accordingly, Ahmad et al. (2024) found that most brand managers have a strong preference to avoid misplacement, and Schmitt (1994, 1986) quotes an expert saying “Advertisers [do] not want to display their products between battle scenes.” This is also reflected in an industry report stating that about 70% of brands take brand safety seriously and that 75% of the interviewed brands report brand-unsafe exposures (GumGum Inc. 2017).

Brands typically consider juxtapositions with hate speech, pornography, and violence as the most egregious violations of brand safety. To mitigate such risks, brands and platforms commonly employ blacklists and negative targeting strategies, defining keywords and publishers associated with these topics to avoid undesirable ad placements. On X (formerly Twitter), for instance, brand managers can utilize adjacency controls, allowing them to specify up to 1,000 negative keywords to regulate the content appearing above and below their ads in users’ timelines. While these measures have proven relatively effective in preventing placements alongside the most brand-unsafe content, misplacements adjacent to disasters, tragedies, divisive political content, and misinformation remain prevalent and, in part, unnoticed: Ahmad et al. (2024) find that most decision-makers are unaware that their companies’ advertising appears on misinformation websites. This persistent challenge may be attributed to the inherent difficulty in accurately classifying and identifying fake news, subtle forms of divisive content, and emerging crisis situations in real-time. GumGum Inc. (2017) reported that 39% of sampled brands experienced their content being displayed adjacent to at least one of these problematic topics.

To illustrate the unique capabilities of DICE, we propose a simple study that extends beyond altering individual posts to modifying entire feeds: Unlike traditional online platform studies, we hold the ad copy and creative constant while manipulating the surrounding context between-subjects. Importantly, this study design is uniquely feasible within the DICE paradigm due to its precise control over the contextual environment—a capability not available in other research methodologies such as vignette studies. This level of control is crucial when examining brand safety, a phenomenon inherently defined by an advertisement’s context. By manipulating the surrounding content while keeping the ad constant, we can directly investigate how context impacts brand perceptions, offering insights into brand safety that would be challenging to obtain through alternative research approaches.

We test the intuitive hypothesis that an inappropriate (compared to a more general) context negatively affects brand attitudes. To better understand whether the effect is also driven by implicit memory effects (Schmitt 1994), we control for cued and uncued recall.

Experimental Design

Our study focuses on scenarios where airlines promote travel destinations through targeted advertising, placing ads in contexts that align with specific destinations. Given that major airlines serve numerous destinations globally, these ad placements are typically managed through automated programmatic systems. We leverage this automated placement approach to create two hypothetical scenarios featuring KLM Royal Dutch Airlines (KLM) promoting flights to Brazil. At the time of the study, Brazil was experiencing severe flooding that claimed at least 95 lives (Buschschlüter 2024). To simulate real-world conditions, we scraped real tweets and assembled them to two distinct Twitter feeds: one covering the natural disaster and another featuring more general content, including coverage of Madonna’s free concert in Rio de Janeiro. This experimental design allows us to examine the impact of contextual advertising in varying circumstances, including during times of crisis.

Given the illustrative character of this study, we assumed that automated placement systems primarily target the keyword “Brazil” without considering nuanced contextual factors. This assumption allowed us to simulate how the same advertisement might appear in markedly different contexts on a social media platform. Consequently, we placed an identical fictitious sponsored post by KLM, promoting flights to Brazil, into both Twitter feeds. The advertisement features a creative (shown in Figure 1) as well as copy that reads: “Brazil’s wild beauty calls! Experience nature like never before. Book your breathtaking adventure with KLM.” While this messaging would typically be considered appropriate for tourism promotion, it appears strikingly insensitive when juxtaposed against news of a natural disaster. [Shall we pre-test this assumption?]

Figure 1: KLM Ad Creative

Method: Participants read instructions and browsed one of two Twitter feeds (flooding-related vs. general) [formerly called unsafe and safe, respectively] in which we placed the KLM ad, before they were directed to a Qualtrics survey. Our stimuli, the two feeds, consisted of 20 real tweets each, with the focal KLM ad placed in fifth position.1 In the survey, we elicited whether participants recalled a brand advertising in the feed, first uncued and then cued (i.e., participants saw a list of a diverse range of brands and indicated whether they recalled seeing them). Next, participants evaluated the target brand on three seven-point scales presented in random order (1 = “Negative/Unfavorable/Dislike” and 7 = “Positive/Favorable/Like”), which we averaged into a single measure. Finally, participants indicated whether they were aware of the flooding, provided demographic information, read a debriefing, and were redirected to Prolific.

Participants: We recruited 299 US-American participants (\(M_{age} = 37\) years; 49% female) from Prolific. All 317 participants who started the experiment and read the instructions submitted the social media feed; 299 of them finished the Qualtrics survey. Of these 299 participants, 111 were assigned to the inappropriate condition. We do not observe selective attrition and are confident that group assignment was indeed random, an assumption that is generally accepted in vignette studies but cannot be presumed in observational studies. Table 1 demonstrates that the two treatment groups do not exhibit significant differences in observables. Nevertheless, the unsafe condition skews slightly younger, as indicated by the second column group.

Table 1: Balance Across Conditions

Characteristic     Female                            Age
                   Beta   95% CI1       p-value      Beta   95% CI1      p-value
condition
    safe           ref.                              ref.
    unsafe         0.04   -0.08, 0.15   0.5          -2.6   -5.3, 0.11   0.060

1 CI = Confidence Interval

Implementation

We implemented the two-cell between-subjects design by creating a CSV file that contains two sets of twenty rows (i.e., twenty tweets for each condition). Whereas all other rows are unique, two of these rows represent one and the same sponsored post, which we simply duplicated before assigning it to each of the two conditions. To specify the sponsored posts, we set sponsored to 1 and provided a landing page in the target column, to which participants are directed when clicking on the ad. In addition, we set its sequence parameter to 5 to guarantee that it is displayed in the fifth position of the feed. We did not specify that parameter for any other tweet, such that DICE orders the remaining tweets randomly between subjects. Finally, we added a source column that provides URLs to the tweets we scraped. Even though this column is not required (DICE does not evaluate it), we consider such a column useful for documentation purposes. The described CSV file, whose structure we illustrate in Table 2, was then uploaded to GitHub such that we can pass the corresponding URL to the DICE app.

Table 2: CSV Excerpt

doc_id  text                                                  username         condition      sponsored  target     sequence
1       Madonna breaks the record for biggest audience…      chart data       appropriate    0
2       Saudades do Rio 🫶🏼 didn’t want to leave…              diplo            appropriate    0
3       50 million people watched on TV Madonna…              Madonna Daily    appropriate    0
4       Chelsea really wanted Real Madrid-bound…              Nizaar Kinsella  appropriate    0
5       Brazil’s wild beauty calls! Experience nature…        KLM              appropriate    1          [KLM url]  5
25      Brazil’s wild beauty calls! Experience nature…        KLM              inappropriate  1          [KLM url]  5
40      i mentioned this on another tweet! if you can help…   Evil Scientist   inappropriate  0
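A short script can assemble such a stimulus file. The sketch below is illustrative, not our actual pipeline: the landing-page URL, file name, and helper function are assumptions, and only the column names follow Table 2.

```python
import csv

# Column layout mirrors Table 2; DICE does not evaluate the source column.
FIELDS = ["doc_id", "text", "username", "condition",
          "sponsored", "target", "sequence", "source"]

# The sponsored post is duplicated so it appears in fifth position in both feeds.
klm_ad = {
    "text": ("Brazil's wild beauty calls! Experience nature like never "
             "before. Book your breathtaking adventure with KLM."),
    "username": "KLM",
    "sponsored": 1,
    "target": "https://www.klm.com",  # illustrative landing page
    "sequence": 5,                    # fixed fifth position in the feed
    "source": "",
}

def build_rows(tweets_by_condition):
    """tweets_by_condition maps a condition name to its list of scraped tweets."""
    rows, doc_id = [], 1
    for condition, tweets in tweets_by_condition.items():
        # Duplicate the focal ad into each condition.
        rows.append({**klm_ad, "doc_id": doc_id, "condition": condition})
        doc_id += 1
        for tweet in tweets:
            rows.append({
                "doc_id": doc_id, "condition": condition, "sponsored": 0,
                "target": "", "sequence": "",  # blank sequence = random order
                **tweet,
            })
            doc_id += 1
    return rows

rows = build_rows({
    "appropriate": [
        {"text": "Madonna breaks the record…", "username": "chart data", "source": ""},
    ],
    "inappropriate": [
        {"text": "i mentioned this on another tweet…", "username": "Evil Scientist", "source": ""},
    ],
})

with open("feeds.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

The resulting file can then be hosted (e.g., on GitHub) and its raw URL passed to the DICE app.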

Results

Brand attitude. As pre-registered, we conducted simple OLS regressions. Among participants who recalled seeing a KLM ad without a cue, the inappropriate feed (\(M_u = 3.56\)) resulted in significantly less favorable brand evaluations than the more appropriate feed (\(M_s = 5.29\), \(F(1, 33) = 11.85\), \(p = 0.002\), \(\text{Cohen's d} = 1.18\)).

This also holds for participants who needed a cue to recall the ad (\(M_u = 3.92\), \(M_s = 4.82\), \(F(1, 61) = 4.54\), \(p = 0.037\), \(\text{Cohen's d} = 0.59\)).

Importantly, we also observe the effect for those participants who do not recall seeing a KLM ad at all (\(M_u = 4.05\), \(M_s = 4.49\), \(F(1, 198) = 8.51\), \(p = 0.004\), \(\text{Cohen's d} = 0.42\)).
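With a single binary regressor, the pre-registered OLS reduces to a two-group mean comparison (the F statistic equals the squared t statistic), and the reported effect size is a pooled-SD Cohen's d. A minimal sketch of that computation, using made-up seven-point ratings rather than the study's data:

```python
import math
from statistics import mean, stdev

def condition_effect(appropriate, inappropriate):
    """Mean difference in brand attitude and pooled-SD Cohen's d."""
    n1, n2 = len(appropriate), len(inappropriate)
    pooled_sd = math.sqrt(((n1 - 1) * stdev(appropriate) ** 2 +
                           (n2 - 1) * stdev(inappropriate) ** 2) / (n1 + n2 - 2))
    diff = mean(appropriate) - mean(inappropriate)
    return diff, diff / pooled_sd

# Illustrative seven-point brand-attitude ratings; not the study's data.
diff, cohens_d = condition_effect([5, 6, 5, 6, 5, 4], [3, 4, 3, 4, 3, 4])
```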

Figure 2: Effect of Misplaced Ad on Brand Evaluations

We illustrate all three effects in panels A, B, and C of Figure 2, respectively. Even though the distributions in panel C accumulate more density at their center, there is still a significant difference between both groups, which indicates an implicit memory effect: even though participants do not recall seeing a KLM ad, the (forgotten) exposure in the inappropriate context diluted the brand’s reputation.

[This raises the question of whether dwell time, an implicit measure, may explain such implicit processes. What would be the theoretical argument? More attention?]

Legacy

Recall. [If we use this study to demonstrate the context-contribution, we don't need dwell time analyses here.] A logit regression provides correlational evidence indicating that an additional second in the viewport increases the odds of recall by about 3%, holding other factors constant (\(p = 0.01\)). Controlling for the experimental condition, this value changes only slightly.2 The interaction term’s large standard error in Model 2 of Table 3 indicates that this correlation does not differ significantly across conditions.

Table 3: Logit Results (DV: Aided Recall)

Characteristic                   Model 1          Model 2          Model 3          Model 4
                                 OR1    p-value   OR1    p-value   OR1    p-value   OR1    p-value
seconds_in_viewport              1.03   0.008     1.05   0.025
condition
    unsafe                                        1.72   0.13                       1.84   0.12
seconds_in_viewport * unsafe                      0.97   0.2
relative_dwell_time                                                24.1   0.091     268    0.057
relative_dwell_time * unsafe                                                        0.01   0.3

1 OR = Odds Ratio
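The odds-ratio interpretation used above follows directly from exponentiating the logit coefficient. The snippet below shows only this conversion; the coefficient value is chosen to match the reported figure of roughly 3% and is not an estimate of its own.

```python
import math

def coef_to_odds_ratio(beta):
    """Exponentiating a logit coefficient gives the multiplicative change in
    the odds per one-unit increase in the predictor."""
    return math.exp(beta)

# A coefficient of roughly 0.03 per second in the viewport corresponds to an
# odds ratio of about 1.03, i.e., a ~3% increase in the odds of recall.
or_per_second = coef_to_odds_ratio(0.03)
```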

[Because we analyze dwell time in the next study in more detail, we should use it here to exclude participants who have not paid attention to the ad: either exclude them or control for dwell time on focal post.]

Dwell Time Case

Lorem ipsum.

References

Ahmad, Wajeeha, Ananya Sen, Charles Eesley, and Erik Brynjolfsson. 2024. “Companies Inadvertently Fund Online Misinformation Despite Consumer Backlash.” Nature 630: 123–31. https://doi.org/10.1038/s41586-024-07404-1.
Bellman, Steven, Ziad H. S. Abdelmoety, Jamie Murphy, Shruthi Arismendez, and Duane Varan. 2018. “Brand Safety: The Effects of Controversial Video Content on Pre-Roll Advertising.” Heliyon 4 (12): e01041. https://doi.org/10.1016/j.heliyon.2018.e01041.
Buschschlüter, Vanessa. 2024. “Brazil Floods: ’We’ve Never Experienced Anything Like It’.” BBC News. https://web.archive.org/web/20240805170342/https://www.bbc.com/news/articles/cle07g0zzqeo.
GumGum Inc. 2017. “The New Brand Safety Crisis: A Fractured Environment.” GumGum. https://web.archive.org/web/20220317063148/https://insights.gumgum.com/hubfs/Brand_Safety_GumGum.pdf.
Hemmings, Mike. 2021. “Ethical Online Advertising: Choosing the Right Tools for Online Brand Safety.” Journal of Brand Strategy 10 (2): 109–20. https://www.ingentaconnect.com/content/hsp/jbs/2021/00000010/00000002/art00003.
Lee, Chunsik, Junga Kim, and Joon Soo Lim. 2021. “Spillover Effects of Brand Safety Violations in Social Media.” Journal of Current Issues and Research in Advertising 42 (4): 354–71. https://doi.org/10.1080/10641734.2021.1905572.
Schmitt, Bernd H. 1994. “Contextual Priming of Visual Information in Advertisements.” Psychology & Marketing 11 (1): 1–14. https://doi.org/10.1002/mar.4220110103.
Swaminathan, Vanitha, Alina Sorescu, J. B. E. M. Steenkamp, Thomas C. O’Guinn, and Bernd H. Schmitt. 2020. “Branding in a Hyperconnected World: Refocusing Theories and Rethinking Boundaries.” Journal of Marketing Research. https://doi.org/10.1177/0022242919899905.

Footnotes

  1. You can browse the flooding-related feed here and the more general feed here.↩︎

  2. An additional second in the viewport increases the odds of recall by about 5% (\(OR = 1.05\), \(p = 0.02\)).↩︎